Abstract

In analogy to the well-known notion of finite-state compressibility of individual sequences, 
due to Lempel and Ziv, we define a similar notion of "finite-state encryptability" of an indi- 
vidual plain-text sequence, as the minimum asymptotic key rate that must be consumed by 
finite-state encrypters so as to guarantee perfect secrecy in a well-defined sense. Our main
result is that the finite-state encryptability is equal to the finite-state compressibility for
every individual sequence. This parallels Shannon's classical probabilistic result,
asserting that the minimum required key rate is equal to the entropy rate of the source.
However, the redundancy, defined as the gap between the upper bound (direct part) and the 
lower bound (converse part) in the encryption problem, turns out to decay at a different rate 
(in fact, much slower) than the analogous redundancy associated with the compression problem. 
We also extend our main theorem in several directions, allowing (i) availability of side
information (SI) at the encrypter/decrypter/eavesdropper, (ii) lossy reconstruction at the decrypter, and
(iii) the combination of both lossy reconstruction and SI, in the spirit of the Wyner-Ziv problem. 

Index Terms: Information-theoretic security, Shannon's cipher system, secret key, perfect se- 
crecy, individual sequences, finite-state machine, compressibility, incremental parsing, Lempel- 
Ziv algorithm, side information. 

1 Introduction 

The paradigm of individual sequences and finite-state machines (FSMs), as an alternative to the 
traditional probabilistic modeling of sources and channels, has been studied and explored quite 
extensively in several information-theoretic problem areas, including data compression [5], [13], 
[14], [18], [21], [24], [26], [27], [30], source/channel simulation [9], [15], classification [29], [31],
prediction [2], [3], [12], [20], [22], [32], denoising [19], and even channel coding [8], [17], to name
just a few representative references out of many more. On the other hand, it is fairly safe to say that
the entire literature on information-theoretic security, starting from Shannon's classical work [16]
and ending with some of the most recent work in this problem area (see, e.g., [4], [6], [7], [10], [23]
for surveys as well as references therein), is based exclusively on the probabilistic setting.

*This research was supported by ISF grant no. 208/08.

To the best of our knowledge, the only exception to this rule is an unpublished memorandum 
by Ziv [25]. In that work, the plain-text source to be encrypted, using a secret key, is an individual 
sequence, the encrypter is a general block encoder, and the eavesdropper employs an FSM as a 
message discriminator. Specifically, it is postulated in [25] that the eavesdropper may have some 
prior knowledge about the plain-text that can be expressed in terms of the existence of some set of
"acceptable messages" that constitutes the a priori level of uncertainty (or equivocation) that the
eavesdropper has concerning the plain-text message: The larger the acceptance set, the larger is the 
uncertainty. Next, it is assumed that there exists an FSM that can test whether a given candidate 
plain-text message is acceptable or not: If and only if the FSM produces the all-zero sequence in 
response to that message, then this message is acceptable. Perfect security is then defined as a 
situation where the size of the acceptance set is not reduced (and hence neither is the uncertainty) 
in the presence of the cryptogram. The main result in [25] is that the asymptotic key rate needed
for perfectly secure encryption in that sense cannot be smaller (up to asymptotically vanishing
terms) than the Lempel-Ziv (LZ) complexity of the plain-text source [30]. This lower bound is
obviously asymptotically achieved by one-time pad encryption of the bit-stream obtained by LZ
data compression of the plain-text source. This parallels Shannon's classical probabilistic
result, asserting that the minimum required key rate is equal to the entropy rate of
the source.

In this paper, we also consider encryption of individual sequences, but our modeling approach 
and the definition of perfect secrecy are substantially different. Rather than assuming that the 
encrypter and decrypter have unlimited resources, and that it is the eavesdropper which has limited 
resources, modeled in terms of FSMs, in our setting, the converse is true. We adopt a model of 
a finite-state encrypter, which receives as inputs the plain-text stream and the secret key
bit-stream and produces a cipher-text, while the internal state variable of the FSM, which designates
limited memory of the past plain-text, is evolving in response to the plain-text input. Based on 
this model, we define a notion of finite-state encryptability (in analogy to the notions of
finite-state compressibility [30] and finite-state predictability [2]) as the minimum achievable rate
at which key bits must be consumed by any finite-state encrypter in order to guarantee perfect 
security against an unauthorized party, while keeping the cryptogram decipherable at the legitimate 
receiver, which has access to the key. Our main result is that the finite-state encryptability is equal 
to the finite-state compressibility, as in [25].

More precisely, denoting by $c(x^n)$ the number of LZ phrases associated with the plain-text
$x^n=(x_1,\ldots,x_n)$, we show that the number of key bits required by any encrypter with $s$ states,
normalized by $n$ (i.e., the key rate), cannot be smaller than $[c(x^n)\log c(x^n)]/n-\delta_s(n)$, where
$\delta_s(n)=O(s\log(\log n)/\sqrt{\log n})$. On the other hand, this bound is essentially achievable
by applying the LZ '78 algorithm [30], followed by one-time pad encryption (i.e., bit-by-bit XORing
of the compressed bits with key bits), since the compression ratio of the LZ '78 algorithm is also
$[c(x^n)\log c(x^n)]/n$, up to vanishingly small terms. It follows that the finite-state encryptability
of every (infinite) individual sequence is equal to its finite-state compressibility.
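The direct scheme just described (LZ '78 compression followed by one-time padding) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the encoder of [30]: the alphabet is taken to be $\{0,\ldots,\alpha-1\}$ with $\alpha\ge 2$, the $j$-th phrase is sent as the integer (prefix index)$\cdot\alpha+$(last symbol) in $\lceil\log(j\alpha)\rceil$ bits, the decoder is assumed to know $n$, and the helper names `lz78_encode`, `lz78_decode` and `one_time_pad` are hypothetical.

```python
import math
import secrets


def lz78_encode(x, alpha):
    """Incremental parsing of x; the j-th phrase is sent as prefix_index * alpha + last_symbol
    in ceil(log2(j * alpha)) bits. Returns (bits, c), with c the number of phrases."""
    index = {(): 0}                       # phrase -> dictionary index (0 = empty phrase)
    bits, phrase, j = [], (), 0

    def emit(prefix, symbol):
        nonlocal j
        j += 1
        value = index[prefix] * alpha + symbol
        width = (j * alpha - 1).bit_length()           # = ceil(log2(j * alpha))
        bits.extend((value >> b) & 1 for b in reversed(range(width)))

    for symbol in x:
        if phrase + (symbol,) in index:
            phrase += (symbol,)                         # still a known phrase: keep extending
        else:
            emit(phrase, symbol)                        # shortest phrase not seen before
            index[phrase + (symbol,)] = len(index)
            phrase = ()
    if phrase:                                          # incomplete (repeated) last phrase
        emit(phrase[:-1], phrase[-1])
    return bits, j


def lz78_decode(bits, alpha, n):
    phrases, out, pos, j = [()], [], 0, 0
    while len(out) < n:
        j += 1
        width = (j * alpha - 1).bit_length()
        value = 0
        for b in bits[pos:pos + width]:
            value = (value << 1) | b
        pos += width
        prefix, symbol = divmod(value, alpha)
        phrases.append(phrases[prefix] + (symbol,))
        out.extend(phrases[-1])
    return out[:n]


def one_time_pad(bits, key):
    """Bit-by-bit XOR; encryption and decryption are the same operation."""
    assert len(bits) == len(key)
    return [b ^ k for b, k in zip(bits, key)]


x = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0] * 20
alpha = 2
bits, c = lz78_encode(x, alpha)
key = [secrets.randbits(1) for _ in bits]       # fresh, uniformly random key bits
y = one_time_pad(bits, key)                     # the cryptogram
assert lz78_decode(one_time_pad(y, key), alpha, len(x)) == x
key_rate = len(key) / len(x)
bound = (c + 1) * math.log2(2 * alpha * (c + 1)) / len(x)
print(f"c = {c}, key rate = {key_rate:.3f} <= {bound:.3f}")
```

With this phrase encoding, the number of key bits consumed is at most $\sum_{j=1}^{c}\lceil\log(j\alpha)\rceil\le c\log(2\alpha c)$, which never exceeds $[c(x^n)+1]\log[2\alpha(c(x^n)+1)]$, the bound of [30, Theorem 2]; the last check above is exactly this comparison.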

While the idea of LZ data compression, followed by one-time padding is rather straightforward, 
our main result, that no finite-state encrypter can do better than that for any given individual
sequence, is less obvious, since the operations of compression and encryption are basically
different: secret key encryption need not be based on compression followed by one-time
padding, definitely not if both operations are formalized in the framework of finite-state machines.

For finite sequences of length n, the difference between the upper bound (of the direct part) and 
the lower bound (of the converse part), which can be thought of as some notion of redundancy, is 
again $O(s\log(\log n)/\sqrt{\log n})$, which decays much more slowly than the corresponding redundancy
in data compression [30, Theorems 1, 2], which is roughly $O((\log s)/\log n)$.

Finally, we extend our main theorem in two directions, first one at a time and then
simultaneously. The first extension allows side information (SI) to be available at all three
parties (encrypter, legitimate decrypter and eavesdropper) or at the decrypter and the eavesdropper 
only. We assume that the SI sequence is an individual sequence as well. We also assume that it 
is the same SI that is available to all three parties in the first case or to both the legitimate 
decrypter and the eavesdropper in the second case. Extensions to situations where different versions
of the SI are available to different users are deferred to the last step, which covers the most general scenario
we study in this work. Our main result is essentially unaltered, except that the LZ complexity,
$\rho_{LZ}(x^n)=[c(x^n)\log c(x^n)]/n$, is replaced by the conditional LZ complexity given the SI, to be
defined later (see also [11], [28]). Our second extension is to the case where lossy reconstruction is
allowed at the legitimate receiver (first, without SI). Here the LZ complexity is replaced by a notion
of "LZ rate-distortion function," $r_{LZ}(D;x^n)$, which is the smallest LZ complexity among all
sequences that are within the allowed distortion relative to the input plain-text sequence. While our
framework allows randomized reconstruction sequences (that may depend on the random key), we 
find that at least asymptotically, there is nothing to gain from this degree of freedom, as optimum 
performance can be achieved by a scheme that generates deterministic reproductions. Finally, we 
allow both SI and lossy reconstruction at the same time. Moreover, every party might have access to 
a different version of the SI. The SI available to the legitimate receiver is assumed to be generated 
by the plain-text source via a known memoryless channel. Here we no longer characterize the
performance in terms of LZ complexities of sequences, but rather in the spirit of the Wyner-Ziv
rate-distortion function for individual sequences using finite-state encoders and decoders [13].

It should be pointed out that throughout the entire paper, most of our emphasis is on converse 
theorems (lower bounds). The compatible direct parts (upper bounds) will always be attainable 
by a straightforward application of the suitable data compression scheme, followed by one-time 
padding. 

The outline of the remaining part of this paper is as follows. In Section 2, we establish some 
notation conventions and we formally define the model and the problem. In Section 3, we assert 
and prove the main result. Finally, in Section 4, we extend our results in the above-mentioned 
directions, and we point out how exactly the proof of the basic theorem should be modified in each 
case in order to support our assertions. 

2 Notation Conventions and Problem Formulation 

We begin by establishing some notation conventions. Throughout this paper, scalar random vari- 
ables (RV's) will be denoted by capital letters, their sample values will be denoted by the respective 
lower case letters, and their alphabets will be denoted by the respective calligraphic letters. A
similar convention will apply to random vectors and their sample values, which will be denoted with
the same symbols superscripted by the dimension. Thus, for example, $A^m$ ($m$ a positive integer) will
denote a random $m$-vector $(A_1,\ldots,A_m)$, and $a^m=(a_1,\ldots,a_m)$ is a specific vector value in $\mathcal{A}^m$, the
$m$-th Cartesian power of $\mathcal{A}$. The notations $a_i^j$ and $A_i^j$, where $i$ and $j$ are integers and $i\le j$, will
designate segments $(a_i,\ldots,a_j)$ and $(A_i,\ldots,A_j)$, respectively, where for $i=1$, the subscript will be
omitted (as above). For $i>j$, $a_i^j$ (or $A_i^j$) will be understood as the null string.

Sources and channels will be denoted generically by the letter P or Q, subscripted by the 
name of the RV and its conditioning, if applicable, exactly like in ordinary textbook notation 
standards, e.g., $P_{X^m}(x^m)$ is the probability function of $X^m$ at the point $X^m=x^m$, $P_{W|S^m}(w|s^m)$
is the conditional probability of $W=w$ given $S^m=s^m$, and so on. Whenever clear from the
context, these subscripts will be omitted. Information-theoretic quantities, like entropies and
mutual informations, will be denoted following the usual conventions of the information theory
literature, e.g., $H(K^m)$, $I(W;X^m|S^m)$, and so on.

A finite-state encrypter is defined by a sextuple $E=(\mathcal{X},\mathcal{Y},\mathcal{Z},f,g,\Delta)$, where $\mathcal{X}$ is a finite
input alphabet of size $|\mathcal{X}|=\alpha$, $\mathcal{Y}$ is a finite set of binary words, $\mathcal{Z}$ is a finite set of states,
$f:\mathcal{Z}\times\mathcal{X}\times\{0,1\}^*\to\mathcal{Y}$ is the output function, $g:\mathcal{Z}\times\mathcal{X}\to\mathcal{Z}$ is the next-state function,
$\Delta:\mathcal{Z}\times\mathcal{X}\to\{0,1,2,\ldots\}$ gives the number of key bits consumed at each step, and $\{0,1\}^*$ is the set of all binary strings of finite length. The set $\mathcal{Y}$ is
allowed to contain binary strings of various lengths, including the null word $\lambda$ (whose length is zero).
When two infinite sequences, $\boldsymbol{x}=x_1,x_2,\ldots$, $x_i\in\mathcal{X}$, henceforth the plain-text sequence (or the
source sequence), and $\boldsymbol{u}=u_1,u_2,\ldots$, $u_i\in\{0,1\}$, $i=1,2,\ldots$, henceforth the key sequence, are fed
into an encrypter $E$, it produces an infinite output sequence $\boldsymbol{y}=y_1,y_2,\ldots$, $y_i\in\mathcal{Y}$, henceforth the
cryptogram, while passing through an infinite sequence of states $\boldsymbol{z}=z_1,z_2,\ldots$, $z_i\in\mathcal{Z}$, according
to the following recursive equations, implemented for $i=1,2,\ldots$:

$$t_i=t_{i-1}+\Delta(z_i,x_i),\qquad t_0=0 \qquad (1)$$

$$k_i=(u_{t_{i-1}+1},u_{t_{i-1}+2},\ldots,u_{t_i}) \qquad (2)$$

$$y_i=f(z_i,x_i,k_i) \qquad (3)$$

$$z_{i+1}=g(z_i,x_i) \qquad (4)$$

where it is understood that if $\Delta(z_i,x_i)=0$, then $k_i=\lambda$, the null word of length zero,$^1$ namely, no
key bits are used in the $i$-th step. By the same token, if $y_i=\lambda$, no output is produced at this step,
i.e., the system is idling and only the state evolves in response to the input. An encrypter with $s$
states, or an $s$-state encrypter, $E$, is one with $|\mathcal{Z}|=s$. It is assumed that the plain-text sequence
$\boldsymbol{x}$ is deterministic (i.e., an individual sequence), whereas the key sequence $\boldsymbol{u}$ is purely random, i.e.,
for every positive integer $n$, $P_{U^n}(u^n)=2^{-n}$.

A few additional notation conventions will be convenient: By $f(z_1,x^n,k^n)$, we refer to the
vector $y^n$ produced by $E$ in response to the inputs $x^n$ and $k^n$ when the initial state is $z_1$. Similarly,
the notation $g(z_1,x^n)$ will mean the state $z_{n+1}$, and $\Delta(z_1,x^n)$ will designate $\sum_{i=1}^n\Delta(z_i,x_i)$ under
the same circumstances. An encrypter $E$ is said to be perfectly secure if for every two positive
integers $n,m$ ($m>n$) and for every $\boldsymbol{x}\in\mathcal{X}^\infty$ and $y_n^m\in\mathcal{Y}^{m-n+1}$, the probability $\Pr\{Y_n^m=y_n^m|\boldsymbol{x}\}$
is independent of $\boldsymbol{x}$.

An encrypter is referred to as information lossless (IL) if for every $z_1\in\mathcal{Z}$, every sufficiently
large$^2$ $n$, and all $x^n\in\mathcal{X}^n$ and $k^n\in\mathcal{K}^n$, the quadruple $(z_1,k^n,f(z_1,x^n,k^n),g(z_1,x^n))$ uniquely
determines $x^n$. It will henceforth be assumed, without loss of generality, that $z_1$ is a certain fixed
member of $\mathcal{Z}$. Given an encrypter $E$ and an input string $x^n$, the encryption key rate of $x^n$ w.r.t.
$E$ is defined as

$$\sigma_E(x^n)=\frac{\ell(k^n)}{n}, \qquad (5)$$

where $\ell(k_i)=\Delta(z_i,x_i)$ is the length of the binary string $k_i$ and $\ell(k^n)=\sum_{i=1}^n\ell(k_i)$ is the total
length of $k^n$.

The set of all perfectly secure, IL encrypters $\{E\}$ with no more than $s$ states will be denoted
by $\mathcal{E}(s)$. The minimum of $\sigma_E(x^n)$ over all encrypters in $\mathcal{E}(s)$ will be denoted by $\sigma_s(x^n)$, i.e.,

$$\sigma_s(x^n)=\min_{E\in\mathcal{E}(s)}\sigma_E(x^n). \qquad (6)$$

$^1$ Note that the evolution of the state $z_i$ depends only on the source inputs $\{x_i\}$, not on the key bits. The rationale
is that the role of $z_i$ is to store past memory of the information sequence $x^n$, in order to take advantage of empirical
correlations and repetitive patterns in that sequence, whereas memory of past key bits, which are i.i.d., is irrelevant.
Nonetheless, it is possible to extend the encrypter model to have two separate state variables, one evolving with
dependence on $\{x_i\}$ only (as above) and one with dependence on both $\{x_i\}$ and $\{k_i\}$, where the former state variable
plays a role in the update of $t_i$ and the latter plays a role in the output function.

$^2$ It should be pointed out that this definition of information losslessness is more relaxed (and hence more general)
than the definition in [30]. While in [30] the requirement is imposed for every positive integer $n$, here it is required
only for all sufficiently large $n$. Note that lack of information losslessness in the more restrictive sense of [30] does not
contradict the ability to reconstruct the source at the legitimate decoder. All it means is that reconstruction
of $x^n$ may require more information than just $(z_1,y^n,k^n,z_{n+1})$; for example, some additional data from times later
than $n+1$ may be needed.
Finally, let

$$\sigma_s(\boldsymbol{x})=\limsup_{n\to\infty}\sigma_s(x^n), \qquad (7)$$

and define the finite-state encryptability of $\boldsymbol{x}$ as

$$\sigma(\boldsymbol{x})=\lim_{s\to\infty}\sigma_s(\boldsymbol{x}). \qquad (8)$$

Our purpose is to characterize these quantities and to point out how they can be achieved in
principle. 
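To make the recursion (1)-(4) and the key rate (5) concrete, the following Python sketch iterates a finite-state encrypter given as three callables. It is only an illustration: the function name `run_encrypter` and the toy one-state machine (a per-symbol one-time pad on a binary plain-text, so that $\Delta\equiv 1$ and $\sigma_E(x^n)=1$) are assumptions introduced here, and the key stream is fixed rather than random so that the run is reproducible.

```python
def run_encrypter(f, g, delta, x, u, z1):
    """Iterate eqs. (1)-(4) of a finite-state encrypter E = (X, Y, Z, f, g, Delta).
    Returns the cryptogram y^n, the states z_1, ..., z_{n+1}, and t_n = l(k^n)."""
    t = 0                                   # t_0 = 0
    z, y, states = z1, [], [z1]
    for xi in x:
        t_next = t + delta(z, xi)           # eq. (1)
        k = tuple(u[t:t_next])              # eq. (2): key bits u_{t_{i-1}+1}, ..., u_{t_i}
        y.append(f(z, xi, k))               # eq. (3)
        z = g(z, xi)                        # eq. (4): driven by the plain-text only
        states.append(z)
        t = t_next
    return y, states, t


# Hypothetical 1-state example: per-symbol one-time pad on a binary plain-text.
def pad_output(z, xi, k):
    return xi ^ k[0]


def stay(z, xi):
    return 0


def one_bit(z, xi):
    return 1


x = [1, 0, 1, 1, 0, 0, 1]
u = [0, 1, 1, 0, 1, 0, 1, 1, 1]             # key stream; only the first t_n bits are consumed
y, states, t_n = run_encrypter(pad_output, stay, one_bit, x, u, z1=0)
sigma_E = t_n / len(x)                      # eq. (5): key rate of x^n w.r.t. E
print(y, sigma_E)
```

Since $g$ ignores the key, the state sequence is a function of $x^n$ alone, as required by the model; other choices of $f$, $g$ and $\Delta$ can be plugged in the same way.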

3 Main Result 

Incremental parsing [30] of a string $x^n$ is a sequential procedure of parsing $x^n$ into distinct phrases,
where each new parsed phrase is the shortest string that has not been encountered before as a
phrase of $x^n$, with the possible exception of the last phrase that might be incomplete. Let $c(x^n)$
denote the number of phrases in LZ incremental parsing of $x^n$. The LZ complexity of $x^n$ is defined
as

$$\rho_{LZ}(x^n)\triangleq\frac{c(x^n)\log c(x^n)}{n}. \qquad (9)$$

The finite-state compressibility, $\rho(\boldsymbol{x})$, of the infinite sequence $\boldsymbol{x}=(x_1,x_2,\ldots)$ is defined, in [30],
as the best compression ratio achieved by IL finite-state encoders, analogously to the above
definition of finite-state encryptability. From Theorems 1, 2 and 3 of [30], it follows that $\rho_{LZ}(\boldsymbol{x})=
\limsup_{n\to\infty}\rho_{LZ}(x^n)$ is equal to $\rho(\boldsymbol{x})$.
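A minimal Python sketch of incremental parsing and of $\rho_{LZ}(x^n)$ in (9), assuming logarithms to base 2; the function names are hypothetical. The example string is parsed as $1|0|11|01|010|00|10$, so $c(x^n)=7$.

```python
import math


def lz78_phrase_count(x):
    """c(x^n): number of phrases in the incremental parsing of x (the last may be incomplete)."""
    seen, phrase, count = set(), (), 0
    for symbol in x:
        phrase += (symbol,)
        if phrase not in seen:          # shortest string not encountered before as a phrase
            seen.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)


def rho_lz(x):
    """LZ complexity of eq. (9), with log taken to base 2."""
    c = lz78_phrase_count(x)
    return c * math.log2(c) / len(x)


x = "1011010100010"                      # parses as 1|0|11|01|010|00|10
print(lz78_phrase_count(x), round(rho_lz(x), 4))
```

For such short strings $\rho_{LZ}(x^n)$ can exceed $\log\alpha$; the quantity is meaningful only asymptotically, as in the definition of $\rho_{LZ}(\boldsymbol{x})$.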

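The following theorem establishes a lower bound on $\sigma_s(x^n)$ in terms of $\rho_{LZ}(x^n)$ and hence a
lower bound on $\sigma(\boldsymbol{x})$ in terms of $\rho(\boldsymbol{x})$.

Theorem 1 (Converse to a coding theorem): For every $x^n$,

$$\sigma_s(x^n)\ge\rho_{LZ}(x^n)-\delta_s(n), \qquad (10)$$

where $\delta_s(n)$ is independent of $x^n$ and behaves according to

$$\delta_s(n)=O\left(\frac{s\log(\log n)}{\sqrt{\log n}}\right). \qquad (11)$$

Consequently, $\sigma(\boldsymbol{x})\ge\rho(\boldsymbol{x})$.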
Discussion. A few comments on Theorem 1 are in order.

1. It is readily observed that a compatible direct theorem holds, simply by applying the LZ '78
algorithm followed by one-time pad encryption of the compressed bits. The resulting key
rate needed is then upper bounded by $\frac{1}{n}[c(x^n)+1]\log[2\alpha(c(x^n)+1)]$, following [30, Theorem
2], which is, within negligible terms, equal to $\rho_{LZ}(x^n)$. Thus, $\sigma(\boldsymbol{x})=\rho(\boldsymbol{x})$.

2. Consider the difference between the upper bound pertaining to the direct part (as mentioned 
in item no. 1 above) and the lower bound of the converse part. The behavior of this difference 
is $O(\alpha s\log(\log n)/\sqrt{\log n})$. This behavior is different from the behavior of the corresponding
gap in compression (Theorems 1 and 2 in [30]), which is $O([\log(2\alpha)]\log(8\alpha s^2)/\log n)$.
The guaranteed convergence to optimality is therefore considerably slower in the encryption 
problem. 

3. As will be seen in the proof of Theorem 1, $\sigma_s(x^n)$ is first lower bounded in terms of the $m$-th
order empirical entropy associated with $x^n$ (namely, the entropy associated with the relative
frequency of non-overlapping $m$-blocks of $x^n$), where $m$ is a large positive integer, and then
this empirical entropy in turn is further lower bounded in terms of $\rho_{LZ}(x^n)$. The reason for
the latter passage is to get rid of the dependence of the main term of the lower bound on the 
parameter m, which is arbitrary. This also helps to select the optimum growth rate of m as 
a function of n. 

4. We already mentioned that the definition of the IL property here is somewhat more relaxed 
than in [30] (see footnote no. 2). Moreover, it is possible to relax this requirement even
further by allowing a relatively small uncertainty in $x^n$ given $(z_1,k^n,f(z_1,x^n,k^n),g(z_1,x^n))$
(see Subsection 4.2), at the possible cost of further slowing down the convergence of $\delta_s(n)$.
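To get a feeling for item 2, the following Python snippet evaluates the two orders of magnitude with every hidden constant set to 1, an assumption made purely for illustration (the absolute values are therefore meaningless; only their growth matters). The function name `redundancy_orders` is hypothetical. With $\alpha=2$ and $s=4$, the ratio of the encryption gap to the compression gap simplifies to $\sqrt{\log n}\,\log(\log n)/2$.

```python
import math


def redundancy_orders(log2_n, alpha, s):
    """Orders of the direct-converse gaps, all hidden constants set to 1 (assumption):
    encryption  ~ alpha*s*log(log n)/sqrt(log n),
    compression ~ log(2*alpha)*log(8*alpha*s^2)/log n."""
    enc = alpha * s * math.log2(log2_n) / math.sqrt(log2_n)
    comp = math.log2(2 * alpha) * math.log2(8 * alpha * s ** 2) / log2_n
    return enc, comp


for log2_n in (16, 64, 256, 1024):
    enc, comp = redundancy_orders(log2_n, alpha=2, s=4)
    print(f"log2 n = {log2_n:5d}: encryption ~ {enc:.3f}, "
          f"compression ~ {comp:.4f}, ratio {enc / comp:.1f}")
```

The printed ratio grows as 8, 24, 64 and 160 for $\log n=16$, 64, 256 and 1024, illustrating how much slower the guaranteed convergence is in the encryption problem.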

Proof. Let $m$ divide $n$ and consider the partition of $x^n$ into $n/m$ non-overlapping $m$-vectors

$$\tilde{x}_1,\tilde{x}_2,\ldots,\tilde{x}_{n/m},$$

where $\tilde{x}_i=x_{(i-1)m+1}^{im}$. Recall that for a given $z_{(i-1)m+1}$ and $\tilde{x}_i$, the length $l_i$ of
$\tilde{k}_i=k_{(i-1)m+1}^{im}$ is uniquely determined as $l_i=\Delta(z_{(i-1)m+1},\tilde{x}_i)$. Let us now define a joint empirical
distribution of several variables. For every $a^m\in\mathcal{X}^m$, $z,z'\in\mathcal{Z}$, and every nonnegative integer $l$, let

$$P_{X^mZZ'L}(a^m,z,z',l)=\frac{m}{n}\sum_{i=1}^{n/m}1\{x_{(i-1)m+1}^{im}=a^m,\ z_{(i-1)m+1}=z,\ z_{im+1}=z',\ \Delta(z,a^m)=l\}. \qquad (12)$$
Now, define

$$P_{K^mX^mY^mZZ'L}(\kappa^m,a^m,b^m,z,z',l)=2^{-l}P_{X^mZZ'L}(a^m,z,z',l)\cdot1\{b^m=f(z,a^m,\kappa^m)\}, \qquad (13)$$

where $\kappa^m$ ranges over the binary strings of length $l$.
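The bookkeeping behind (12), and the identity $\frac{1}{n}\sum_i l_i=\frac{1}{m}\sum_l P_L(l)\,l$ that is used below, can be checked numerically. In this Python sketch, the two-state machine (state = previous symbol, one key bit whenever the symbol changes) is a hypothetical example chosen only to produce nontrivial states and key lengths; it is not claimed to be perfectly secure or IL, and `empirical_joint` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction


def empirical_joint(x, z, delta, m):
    """Eq. (12): relative frequency, over the n/m non-overlapping m-blocks, of
    (a^m, z, z', l) = (block, state at block start, state after block, key bits of block).
    Convention: z[t] is the state in which x[t] is read, so len(z) == len(x) + 1."""
    n = len(x)
    assert n % m == 0 and len(z) == n + 1
    counts = Counter()
    for i in range(n // m):
        start, end = i * m, (i + 1) * m
        l = sum(delta(z[t], x[t]) for t in range(start, end))   # l_i = Delta(z_{(i-1)m+1}, block)
        counts[(tuple(x[start:end]), z[start], z[end], l)] += 1
    return {key: Fraction(c, n // m) for key, c in counts.items()}


# Hypothetical 2-state machine: the state is the previous symbol, and one key bit
# is consumed whenever the current symbol differs from it.
def delta(state, symbol):
    return int(symbol != state)


x = [0, 1, 1, 0, 0, 0, 1, 0]
z = [0] + x                               # z_1 = 0 and z_{t+1} = g(z_t, x_t) = x_t
P = empirical_joint(x, z, delta, m=2)
mean_L = sum(p * key[3] for key, p in P.items())
print(mean_L / 2, Fraction(sum(delta(z[t], x[t]) for t in range(len(x))), len(x)))
```

Both printed numbers equal $1/2$ here: the average key length per block divided by $m$ coincides with the overall key rate $\ell(k^n)/n$.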

Throughout this proof, all information measures are defined w.r.t. $P_{K^mX^mY^mZZ'L}$. Consider the
following chain of equalities for the given $x^n$ and an arbitrary encrypter $E\in\mathcal{E}(s)$:

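$$\sigma_E(x^n)=\frac{\ell(k^n)}{n}=\frac{1}{n}\sum_{i=1}^{n/m}\ell(\tilde{k}_i)=\frac{1}{m}\cdot\frac{m}{n}\sum_{i=1}^{n/m}l_i=\frac{H(K^m|L)}{m}, \qquad (14)$$

where the last equality holds since, by (13), $K^m$ is uniformly distributed over $\{0,1\}^l$ given $L=l$, and so
$H(K^m|L)=\sum_l P_L(l)\,l=\frac{m}{n}\sum_{i=1}^{n/m}l_i$.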



Note that the length of the key for the $i$-th $m$-block, $l_i=\ell(\tilde{k}_i)=\Delta(z_{(i-1)m+1},x_{(i-1)m+1}^{im})=
\sum_{t=(i-1)m+1}^{im}\Delta(z_t,x_t)$, is a variable that may take on no more than $(m+1)^{\alpha s-1}$ different values,$^3$
and hence the same is true concerning the random variable $L$, and so, $H(L)\le(\alpha s-1)\log(m+1)$.
Thus,

$$\begin{aligned}
\sigma_E(x^n)&=\frac{1}{m}H(K^m|L)\\
&=\frac{1}{m}[H(K^m)-I(K^m;L)]\\
&\ge\frac{1}{m}[H(K^m)-H(L)]\\
&\ge\frac{1}{m}[H(K^m)-(\alpha s-1)\log(m+1)]. \qquad (15)
\end{aligned}$$
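The counting argument behind the bound $H(L)\le(\alpha s-1)\log(m+1)$ (footnote 3) can be verified by brute force for small parameters. The Python sketch below enumerates all initial states and all $m$-blocks of a hypothetical 3-state machine over a binary alphabet (so $\alpha s-1=5$; the functions `g` and `delta` are arbitrary illustrative choices) and checks that the number of distinct key lengths never exceeds the number of joint types of the pairs $(x_t,z_t)$, which in turn never exceeds $(m+1)^{\alpha s-1}$. Since $H(L)$ is at most the logarithm of the number of values $L$ can take, the entropy bound follows.

```python
from collections import Counter
from itertools import product


def key_lengths_and_types(alpha, s, g, delta, m):
    """Collect all key lengths Delta(z, a^m) and all joint types of the pairs (x_t, z_t)
    over every start state z and every block a^m."""
    lengths, types = set(), set()
    for z0 in range(s):
        for block in product(range(alpha), repeat=m):
            z, pairs, l = z0, Counter(), 0
            for symbol in block:
                pairs[(symbol, z)] += 1     # joint type of (x_t, z_t) along the block
                l += delta(z, symbol)       # key length accumulates Delta(z_t, x_t)
                z = g(z, symbol)
            lengths.add(l)
            types.add(frozenset(pairs.items()))
    return lengths, types


# Hypothetical 3-state machine over a binary alphabet (alpha = 2, s = 3).
def g(z, symbol):
    return (z + symbol + 1) % 3


def delta(z, symbol):
    return 2 * z + symbol


for m in range(1, 6):
    lengths, types = key_lengths_and_types(2, 3, g, delta, m)
    print(m, len(lengths), len(types), (m + 1) ** 5)
```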



Now, for all large $m$,

$$\begin{aligned}
H(K^m)&\ge H(K^m|Y^m)\\
&\ge I(K^m;X^m|Y^m)\\
&=H(X^m|Y^m)-H(X^m|Y^m,K^m)\\
&=H(X^m)-H(X^m|Y^m,K^m)\\
&\ge H(X^m)-H(X^m|Y^m,K^m,Z,Z')-I(Z,Z';X^m|Y^m,K^m)\\
&=H(X^m)-0-I(Z,Z';X^m|Y^m,K^m)\\
&\ge H(X^m)-H(Z,Z'|Y^m,K^m)\\
&\ge H(X^m)-2\log s, \qquad (16)
\end{aligned}$$

$^3$ To see why this is true, observe that the sum that defines $l_i$ depends on $\tilde{x}_i=x_{(i-1)m+1}^{im}$ and $z_{(i-1)m+1}$
only via the joint type class of pairs $(x,z)\in\mathcal{X}\times\mathcal{Z}$ associated with the $i$-th $m$-block. Thus, the number of different values that $l_i$
may take cannot exceed the total number of such type classes, which in turn is upper bounded by $(m+1)^{\alpha s-1}$.

where the second equality is due to the perfect security assumption and the third equality is due
to the IL property, assuming that $m$ is sufficiently large. Thus, combining eqs. (15) and (16), we
obtain

$$\sigma_E(x^n)\ge\frac{H(X^m)}{m}-\frac{2\log s}{m}-(\alpha s-1)\frac{\log(m+1)}{m}. \qquad (17)$$
Now, the main term, $H(X^m)/m$, is nothing but the normalized $m$-th order empirical entropy
associated with $x^n$. Next, as discussed earlier, we further lower bound $H(X^m)/m$ in terms of
$\rho_{LZ}(x^n)$ at the (small) price of reducing the bound further by additional terms that will be shown
later to be negligible. In particular, in the sequel, we prove the following inequality:

$$\frac{H(X^m)}{m}\ge\frac{c(x^n)\log c(x^n)}{n}-\frac{2m(\log\alpha+1)^2}{(1-\epsilon_n)\log n}-\frac{2m\alpha^{2m}(\log\alpha+1)}{n}-\frac{1}{m}, \qquad (18)$$

where $\epsilon_n\to0$ as $n\to\infty$. Combining this with eq. (17), we get

$$\sigma_E(x^n)\ge\frac{c(x^n)\log c(x^n)}{n}-\delta_s(n,m), \qquad (19)$$

where

$$\delta_s(n,m)=\frac{2\log s}{m}+(\alpha s-1)\frac{\log(m+1)}{m}+\frac{2m(\log\alpha+1)^2}{(1-\epsilon_n)\log n}+\frac{2m\alpha^{2m}(\log\alpha+1)}{n}+\frac{1}{m}. \qquad (20)$$

We now have the freedom to let $m=m_n$ grow slowly enough as a function of $n$ such that
$\delta_s(n)=\delta_s(n,m_n)$ will vanish for every fixed $s$. By letting $m_n$ be proportional to $\sqrt{\log n}$, $\delta_s(n)$
becomes $O(s\log(\log n)/\sqrt{\log n})$. Note that the first two terms of $\delta_s(n,m)$ come from
considerations pertaining to encryption, whereas the other terms appear also in compression. The second
term turns out to be the dominant one, which means that in the encryption problem we end
up with slower decay of the redundancy. If we compare the difference between the upper bound
and the lower bound in compression (coding theorem and converse in [30]), this difference is
dominated by a term that is $O([\log(2\alpha)]\log(8\alpha s^2)/\log n)$, whereas in encryption the difference is
$O(\alpha s\log(\log n)/\sqrt{\log n})$, namely, a significantly slower decay rate.
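The choice $m_n\propto\sqrt{\log n}$ can be checked term by term against (20). The following worked estimate assumes $m_n=c\sqrt{\log n}$ for a constant $c>0$, ignores integer rounding and the requirement that $m_n$ divide $n$, and treats $\alpha$ as fixed:

```latex
% m_n = c\sqrt{\log n}, hence \log(m_n+1) = \tfrac12\log(\log n) + O(1)
\begin{aligned}
\frac{2\log s}{m_n} &= O\!\left(\frac{\log s}{\sqrt{\log n}}\right),\\
(\alpha s-1)\frac{\log(m_n+1)}{m_n} &= O\!\left(\frac{\alpha s\,\log(\log n)}{\sqrt{\log n}}\right),\\
\frac{2m_n(\log\alpha+1)^2}{(1-\epsilon_n)\log n} &= O\!\left(\frac{1}{\sqrt{\log n}}\right),\\
\frac{2m_n\alpha^{2m_n}(\log\alpha+1)}{n} &= 2^{-\log n+O(\sqrt{\log n})}\longrightarrow 0,\\
\frac{1}{m_n} &= O\!\left(\frac{1}{\sqrt{\log n}}\right),
\end{aligned}
\qquad\Longrightarrow\qquad
\delta_s(n)=\delta_s(n,m_n)=O\!\left(\frac{\alpha s\,\log(\log n)}{\sqrt{\log n}}\right).
```

The second term dominates, in agreement with the discussion above.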
It remains then to establish eq. (18). To this end, let us first recall the analogous setup of
lossless compression of individual sequences using finite-state machines [30]. A $q$-state encoder $C$
is defined by a quintuple $(\Sigma,\mathcal{B},\mathcal{X},f,g)$, where $\Sigma$ is the state set of size $q$, $\mathcal{B}$ is a finite set of binary
words (possibly of different lengths, including the null word for idling), $\mathcal{X}$ is the finite alphabet of
the source to be compressed, $f:\Sigma\times\mathcal{X}\to\mathcal{B}$ is the encoder output function, and $g:\Sigma\times\mathcal{X}\to\Sigma$ is the
next-state function. When an input sequence $(x_1,x_2,\ldots)$ is fed sequentially into $C=(\Sigma,\mathcal{B},\mathcal{X},f,g)$,
the encoder outputs a sequence of binary words $(b_1,b_2,\ldots)$, $b_i\in\mathcal{B}$, while going through a sequence
of states $(\sigma_1,\sigma_2,\ldots)$, according to

$$b_i=f(\sigma_i,x_i),\qquad \sigma_{i+1}=g(\sigma_i,x_i),\qquad i=1,2,\ldots \qquad (21)$$

where $\sigma_i$ is the state of $C$ at time instant $i$. A finite-state encoder $C$ is said to be information
lossless (IL) if for all $\sigma_i\in\Sigma$ and all $x_i^{i+j-1}\in\mathcal{X}^j$, $j\ge1$, the triple $(\sigma_i,\sigma_{i+j},b_i^{i+j-1})$ uniquely determines
$x_i^{i+j-1}$, where $\sigma_{i+j}$ and $b_i^{i+j-1}=(b_i,\ldots,b_{i+j-1})$ are obtained by iterating eq. (21) with initial state $\sigma_i$
and $x_i^{i+j-1}$ as input. The length function associated with $C$ is defined as $\ell_C(x^n)=\sum_{i=1}^n\ell(b_i)$,
where $\ell(b_i)$ is the length of the binary string $b_i\in\mathcal{B}$.

Consider the incremental parsing of $x^n$ and let $c(x^n)$ be defined as above. According to [30,
Theorem 1], for any $q$-state IL encoder and for every $x^n\in\mathcal{X}^n$, $n\ge1$,

$$\ell_C(x^n)\ge[c(x^n)+q^2]\log\frac{c(x^n)+q^2}{4q^2}. \qquad (22)$$

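Consider next the Shannon code, operating on $x^n$ by successively encoding its $m$-blocks, $\tilde{x}_1,\tilde{x}_2,\ldots,
\tilde{x}_{n/m}$, using an arbitrary probability distribution $Q$. According to this code, $\tilde{x}_i$ is encoded using
$\lceil-\log Q(\tilde{x}_i)\rceil$ bits, and so, its length function is given by

$$\begin{aligned}
\ell_C(x^n)&=\sum_{i=1}^{n/m}\lceil-\log Q(\tilde{x}_i)\rceil\\
&=\frac{n}{m}\sum_{a^m\in\mathcal{X}^m}P_{X^m}(a^m)\lceil-\log Q(a^m)\rceil\\
&\le\frac{n}{m}\sum_{a^m\in\mathcal{X}^m}P_{X^m}(a^m)[-\log Q(a^m)+1]\\
&=-\frac{n}{m}\sum_{a^m\in\mathcal{X}^m}P_{X^m}(a^m)\log Q(a^m)+\frac{n}{m}. \qquad (23)
\end{aligned}$$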
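The two inequalities packed into (23) are easy to observe numerically. In the Python sketch below (function names hypothetical, logarithms to base 2), $Q$ is passed as a callable; with $Q$ equal to the empirical block distribution $P_{X^m}$ the code length lies between $\frac{n}{m}H(X^m)$ and $\frac{n}{m}H(X^m)+\frac{n}{m}$, and with $Q$ uniform on $\{0,1\}^m$ it is exactly $n$ bits.

```python
import math
from collections import Counter


def empirical_blocks(x, m):
    """P_{X^m}: relative frequencies of the n/m non-overlapping m-blocks of x."""
    counts = Counter(tuple(x[i:i + m]) for i in range(0, len(x), m))
    return {a: c * m / len(x) for a, c in counts.items()}


def shannon_length(x, m, Q):
    """Length function of the block Shannon code: sum of ceil(-log2 Q(block)) over blocks."""
    return sum(math.ceil(-math.log2(Q(tuple(x[i:i + m])))) for i in range(0, len(x), m))


def block_entropy(P):
    return -sum(p * math.log2(p) for p in P.values())


x = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
m, n = 2, len(x)
P = empirical_blocks(x, m)
H = block_entropy(P)
L_emp = shannon_length(x, m, lambda a: P[a])          # Q = empirical distribution
L_uni = shannon_length(x, m, lambda a: 2.0 ** -m)     # Q = uniform over {0,1}^m
print(n / m * H, L_emp, n / m * H + n / m, L_uni)
```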

It is easy to see that this code can be implemented by a finite-state encoder in the following
manner: At the beginning of each block ($t\bmod m=1$), the encoder is always at some fixed initial
state $\sigma_0$. At time instant $t=(i-1)m+j$, $1\le j\le m$, the state $\sigma_t$ is defined as $x_{(i-1)m+1}^{t-1}$ (so that
$\sigma_t=\sigma_0$, the null string, for $j=1$). The encoder outputs the null string whenever $t\bmod m\ne0$; when
$t\bmod m=0$, the encoder emits the Shannon codeword of the block just completed. The total number of states is therefore
$q=\sum_{j=0}^{m-1}\alpha^j=(\alpha^m-1)/(\alpha-1)$. It is also easy to see that the Shannon code is IL. For given positive
integers $i$ and $j$, suppose we are given $\sigma_i$, $\sigma_{i+j}$, and $(b_i,\ldots,b_{i+j-1})$. Then $(x_i,x_{i+1},\ldots,x_{i+j-1})$
can be reconstructed as follows. If time instants $i$ and $i+j$ fall in the same $m$-block, then $\sigma_{i+j}$
conveys full information on $(x_i,x_{i+1},\ldots,x_{i+j-1})$. Otherwise, we use the following procedure: The
segment from time $i$ until the end of the current block is reconstructed by decoding the codeword
emitted at the end of this block. Similarly, if there are any additional blocks that are fully contained
in the segment from $i$ to $i+j-1$, they can also be reconstructed by decoding. Finally, the portion of
the last block until position $i+j-1$ can be recovered again from the final state.
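The finite-state implementation just described can be sketched directly in Python. The sketch assumes a hypothetical block distribution $Q$ for $m=2$ over the binary alphabet and uses Shannon's classical cumulative-probability construction of the codewords (blocks sorted by decreasing probability; the codeword is the first $\lceil-\log Q(a^m)\rceil$ bits of the binary expansion of the cumulative probability), which yields a prefix-free code; all names are illustrative. The state is the partial block, an output is emitted only at the end of each block, and at most $(\alpha^m-1)/(\alpha-1)=3$ states are visited.

```python
from fractions import Fraction


def shannon_codebook(Q):
    """Shannon's construction: sort blocks by decreasing probability; the codeword of a block
    is the first ceil(-log2 Q(a)) bits of the binary expansion of the cumulative probability."""
    book, F = {}, Fraction(0)
    for a, p in sorted(Q.items(), key=lambda kv: -kv[1]):
        length = 0
        while Fraction(1, 2 ** length) > p:      # length = ceil(-log2 p), computed exactly
            length += 1
        frac, word = F, ""
        for _ in range(length):                  # binary expansion of F, bit by bit
            frac *= 2
            bit = int(frac >= 1)
            word += str(bit)
            frac -= bit
        book[a] = word
        F += p
    return book


def block_encoder(book, m):
    """Output and next-state functions of eq. (21); the state is the current partial block."""
    def f(state, symbol):
        block = state + (symbol,)
        return book[block] if len(block) == m else ""    # null word unless the block is complete

    def g(state, symbol):
        block = state + (symbol,)
        return () if len(block) == m else block           # back to sigma_0 = () after each block

    return f, g


def decode(bits, book):
    inverse, out, word = {w: a for a, w in book.items()}, [], ""
    for b in bits:
        word += b
        if word in inverse:                               # Shannon codes are prefix-free
            out.extend(inverse[word])
            word = ""
    return out


Q = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4), (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}
book = shannon_codebook(Q)
f, g = block_encoder(book, m=2)
x = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0]
state, outputs, visited = (), [], {()}
for symbol in x:
    outputs.append(f(state, symbol))
    state = g(state, symbol)
    visited.add(state)
print(book, "".join(outputs), len(visited))
```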

It now follows that the length function of the Shannon code must satisfy the lower bound (22)
with $q=q_m=(\alpha^m-1)/(\alpha-1)\le\alpha^m$, and so,

$$-\frac{n}{m}\sum_{a^m\in\mathcal{X}^m}P_{X^m}(a^m)\log Q(a^m)+\frac{n}{m}\ge[c(x^n)+q_m^2]\log\frac{c(x^n)+q_m^2}{4q_m^2}. \qquad (24)$$



Since this holds for every $Q$ while the right-hand side is independent of $Q$, we may minimize the
left-hand side w.r.t. $Q$ and obtain

$$\begin{aligned}
\frac{n}{m}H(X^m)+\frac{n}{m}&\ge[c(x^n)+q_m^2]\log\frac{c(x^n)+q_m^2}{4q_m^2}\\
&\ge c(x^n)\log c(x^n)-c(x^n)\log(4q_m^2)-q_m^2\log(4q_m^2)\\
&\ge c(x^n)\log c(x^n)-c(x^n)\log(4\alpha^{2m})-\alpha^{2m}\log(4\alpha^{2m})\\
&\ge c(x^n)\log c(x^n)-2mc(x^n)(1+\log\alpha)-2m\alpha^{2m}(1+\log\alpha)\\
&\ge c(x^n)\log c(x^n)-\frac{2mn(1+\log\alpha)^2}{(1-\epsilon_n)\log n}-2m\alpha^{2m}(1+\log\alpha), \qquad (25)
\end{aligned}$$

where the last inequality follows from [30, eq. (6)]. Eq. (18) is now obtained by normalizing both
sides by $n$. This completes the proof of Theorem 1.



4 Extensions 

In this section, we extend Theorem 1 in two directions, availability of SI and lossy reconstruction. 
As described in the Introduction, we first consider each one of these directions separately, and then 
jointly.



4.1 Availability of Side Information 

Consider the case where SI is available at the encrypter/decrypter/eavesdropper. Suppose that, 
in addition to the source sequence $\boldsymbol{x}$, there is an (individual) SI sequence $\boldsymbol{s}=(s_1,s_2,\ldots)$, $s_i\in\mathcal{S}$,
$i=1,2,\ldots$, where $\mathcal{S}$ is a finite alphabet. Let us assume first that all three parties (encoder, decoder,
and eavesdropper) have access to $\boldsymbol{s}$. In the formal model definition, a few modifications are needed:

1. In eqs. (1), (3), and (4), the functions $\Delta$, $f$, and $g$ should be allowed to depend on the
additional argument $s_i$.

2. The definition of perfect security should allow conditioning on $\boldsymbol{s}$, in addition to the present
conditioning on $\boldsymbol{x}$, i.e., $\Pr\{Y_n^m=y_n^m|\boldsymbol{x},\boldsymbol{s}\}$ is independent of $\boldsymbol{x}$ for all positive integers $n,m$
(but it is allowed to depend on $\boldsymbol{s}$).

3. In the definition of an IL encrypter, the quadruple $(z,k^n,f(z,x^n,k^n),g(z,x^n))$ should be
extended to the quintuple $(z,k^n,s^n,f(z,x^n,k^n),g(z,x^n))$.

In Theorem 1, the LZ complexity of $x^n$ should be replaced by the conditional LZ complexity
of $x^n$ given $s^n$, denoted $\rho_{LZ}(x^n|s^n)$, which is an empirical measure of conditional entropy
(or conditional compressibility) that is defined as follows (see also [11], [28]): Given $x^n$ and
$s^n$, let us apply the incremental parsing procedure of the LZ algorithm to the sequence of pairs
$((x_1,s_1),(x_2,s_2),\ldots,(x_n,s_n))$. According to this procedure, all phrases are distinct with a possible
exception of the last phrase, which might be incomplete. Let $c(x^n,s^n)$ denote the number of
distinct phrases. For example,$^4$ if

$$x^6=0\,|\,1\,|\,00\,|\,01\,|$$
$$s^6=0\,|\,1\,|\,01\,|\,01\,|$$

then $c(x^6,s^6)=4$. Let $c(s^n)$ denote the resulting number of distinct phrases of $s^n$, and let $s(l)$
denote the $l$-th distinct $s$-phrase, $l=1,2,\ldots,c(s^n)$. In the above example, $c(s^6)=3$. Denote by
$c_l(x^n|s^n)$ the number of occurrences of $s(l)$ in the parsing of $s^n$, or equivalently, the number of
distinct $x$-phrases that jointly appear with $s(l)$. Clearly, $\sum_{l=1}^{c(s^n)}c_l(x^n|s^n)=c(x^n,s^n)$. In the above
example, $s(1)=0$, $s(2)=1$, $s(3)=01$, $c_1(x^6|s^6)=c_2(x^6|s^6)=1$, and $c_3(x^6|s^6)=2$. Now, the
conditional LZ complexity of $x^n$ given $s^n$ is defined as

$$\rho_{LZ}(x^n|s^n)=\frac{1}{n}\sum_{l=1}^{c(s^n)}c_l(x^n|s^n)\log c_l(x^n|s^n). \qquad (26)$$

$^4$ The same example appears in [28].
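A Python sketch of the conditional parsing and of (26), assuming base-2 logarithms; the function names are hypothetical. Running it on the example above reproduces $c(x^6,s^6)=4$, $c(s^6)=3$, $c_1=c_2=1$, $c_3=2$, and hence $\rho_{LZ}(x^6|s^6)=\frac{1}{6}\cdot2\log2=\frac13$.

```python
import math
from collections import Counter


def joint_parse(x, s):
    """Incremental parsing of ((x_1, s_1), ..., (x_n, s_n)); returns (x-part, s-part) per phrase."""
    seen, phrases, cur = set(), [], ()
    for pair in zip(x, s):
        cur += (pair,)
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ()
    if cur:                                   # possibly incomplete last phrase
        phrases.append(cur)
    return [(tuple(a for a, _ in ph), tuple(b for _, b in ph)) for ph in phrases]


def conditional_lz(x, s):
    """Returns c(x^n, s^n), c(s^n), {s(l): c_l(x^n|s^n)} and rho_LZ(x^n|s^n) of eq. (26)."""
    phrases = joint_parse(x, s)
    c_l = Counter(s_part for _, s_part in phrases)   # occurrences of each distinct s-phrase
    rho = sum(c * math.log2(c) for c in c_l.values()) / len(x)
    return len(phrases), len(c_l), dict(c_l), rho


x6 = [0, 1, 0, 0, 0, 1]
s6 = [0, 1, 0, 1, 0, 1]
c_joint, c_s, c_l, rho = conditional_lz(x6, s6)
print(c_joint, c_s, c_l, rho)
```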

The proof of Theorem 1 extends quite straightforwardly: The definition of $P_{K^mX^mY^mZZ'L}$ should be
extended to $P_{K^mS^mX^mY^mZZ'L}$ to account for the empirical distribution that includes the $m$-blocks
of $s^n$. In (16), all the conditionings should include $S^m$ in addition to all existing conditionings,
resulting in the inequality

$$H(K^m)\ge H(X^m|S^m)-2\log s. \qquad (27)$$

Finally, $H(X^m|S^m)$ is further lower bounded in terms of $\rho_{LZ}(x^n|s^n)$, since the latter is essentially
a lower bound on the compression ratio of $x^n$ given $s^n$ using finite-state encoders (see [11,
eq. (13)]). The direct part is obtained by first compressing $x^n$ to about $n\cdot\rho_{LZ}(x^n|s^n)$ bits using the
conditional parsing scheme [28, Lemma 2, eq. (A.11)] and then applying one-time pad encryption.

The same performance can be achieved even if the encrypter does not have access to s n , by using 
a scheme in the spirit of Slepian-Wolf coding: Randomly assign to each member of X n a bin, selected 
independently at random across the set $\{1,2,\ldots,2^{nR}\}$. The encrypter applies one-time pad to the
$(nR)$-bit binary representation of the bin index of $x^n$. The decrypter first decrypts the bin index
using the key and then seeks a sequence $x^n$ within the given bin that satisfies $\rho_{LZ}(x^n|s^n)\le R-\epsilon$.
If there is one and only one such sequence, then it becomes the decoded message; otherwise, an error
is declared. This scheme works, just like the ordinary SW coding scheme, because the number of
sequences $\{x^n\}$ for which $\rho_{LZ}(x^n|s^n)\le R-\epsilon$ does not exceed $2^{n[R-\epsilon+O(\log(\log n)/\log n)]}$ [28, Lemma 2]. The
weakness of this scheme is that prior knowledge of (a tight upper bound on) $\rho_{LZ}(x^n|s^n)$ is required. If, for
example, it is known that $x^n$ is a noisy version of $s^n$, generated, say, by a known additive channel,
then $R$ should be essentially the entropy rate of the noise.
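The binning scheme can be illustrated by brute force for a toy block length. In this Python sketch, the bin map is drawn from a seeded pseudo-random generator that stands in for the random assignment shared by the encrypter and the decrypter (an assumption; any common randomness would do), with $n=10$, $R=0.8$, $\epsilon=0.05$, and a plain-text that is a hypothetical one-bit corruption of $s^n$; all function names are illustrative. The encrypter never sees $s^n$, and the decrypter returns `None` when the bin contains zero or several admissible sequences, i.e., when an error is declared. Nothing is claimed about the error probability at so small an $n$; the sketch only shows the mechanics.

```python
import math
import random
import secrets
from collections import Counter
from itertools import product


def conditional_lz(x, s):
    """rho_LZ(x^n | s^n) of eq. (26): incremental parsing of the pairs (x_i, s_i)."""
    seen, cur, s_phrases = set(), (), []
    for pair in zip(x, s):
        cur += (pair,)
        if cur not in seen:
            seen.add(cur)
            s_phrases.append(tuple(b for _, b in cur))
            cur = ()
    if cur:                                          # incomplete last phrase
        s_phrases.append(tuple(b for _, b in cur))
    return sum(c * math.log2(c) for c in Counter(s_phrases).values()) / len(x)


def assign_bins(alpha, n, num_bits, seed):
    """Random binning of X^n into 2^num_bits bins (seeded so both ends can share it)."""
    rng = random.Random(seed)
    return {xs: rng.getrandbits(num_bits) for xs in product(range(alpha), repeat=n)}


def decrypt_and_decode(cipher, key, bins, s, threshold):
    bin_index = cipher ^ key                         # remove the one-time pad
    hits = [xs for xs, b in bins.items()
            if b == bin_index and conditional_lz(xs, s) <= threshold]
    return hits[0] if len(hits) == 1 else None       # None: an error is declared


n, R, eps = 10, 0.8, 0.05
num_bits = round(n * R)                              # the (nR)-bit bin index
bins = assign_bins(alpha=2, n=n, num_bits=num_bits, seed=2024)
s = (0, 1, 1, 0, 1, 0, 0, 1, 1, 1)                   # side information at the decrypter
x = (0, 1, 1, 0, 1, 0, 0, 1, 1, 0)                   # hypothetical noisy version of s
key = secrets.randbits(num_bits)
cipher = bins[x] ^ key                               # encrypter: one-time pad on the bin index only
x_hat = decrypt_and_decode(cipher, key, bins, s, R - eps)
print("decoded correctly" if x_hat == x else "error declared")
```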

The case where the legitimate receiver and the eavesdropper have access to different SI sequences will be discussed in Subsection 4.3, where we also extend the scope to lossy reconstruction.
4.2 Lossy Reconstruction



Suppose that we are content with a lossy reconstruction, $\hat{x}^n$, at the legitimate receiver. In general, this reconstruction may be a random vector due to possible dependence on the random key bits. It is required, however, that $d(x^n,\hat{x}^n) \le nD$ with probability one, for some distortion measure $d$.
Then, in Theorem 1, $\rho_{LZ}(x^n)$ should be replaced by the "LZ rate-distortion function" of $x^n$, which is defined as

$$r_{LZ}(D;x^n) \triangleq \min_{\hat{x}^n:\ d(x^n,\hat{x}^n)\le nD} \rho_{LZ}(\hat{x}^n). \qquad (28)$$
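For very short binary sequences, (28) can be evaluated by brute force. The Python sketch below assumes that $\rho_{LZ}(x^n) = c(x^n)\log c(x^n)/n$, with $c(x^n)$ the number of phrases in the incremental (LZ78) parsing, and uses Hamming distortion purely for illustration (the text places no assumption on $d$). The function names are hypothetical; the enumeration is exponential in $n$ and is meant only to make the definition concrete.

```python
import itertools
import math


def lz78_phrase_count(x):
    """Number of phrases in the LZ78 incremental parsing of x (tail included)."""
    seen, start, count = set(), 0, 0
    for i in range(len(x)):
        phrase = tuple(x[start:i + 1])
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            start = i + 1
    return count + (1 if start < len(x) else 0)


def rho_lz(x):
    """Assumed form of the LZ complexity: c log2 c / n."""
    c = lz78_phrase_count(x)
    return c * math.log2(c) / len(x) if c > 1 else 0.0


def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))


def r_lz(D, x, alphabet=(0, 1)):
    """Brute-force eq. (28): minimum of rho_lz over the sphere of radius nD."""
    n = len(x)
    return min(rho_lz(xh) for xh in itertools.product(alphabet, repeat=n)
               if hamming(x, xh) <= n * D)
```

Since the distortion spheres are nested, $r_{LZ}(D;x^n)$ is non-increasing in $D$ and coincides with $\rho_{LZ}(x^n)$ at $D = 0$.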

In the proof of Theorem 1, the joint distribution $P_{K^m\hat{X}^mX^mY^mZZ'L}$ should be defined as the expectation (w.r.t. the randomness of the key) of the $m$-th order empirical distribution extracted from the sequences $(k^n, x^n, \hat{x}^n, y^n)$ and the resulting states and key lengths $\{l_j\}$. The definition of the IL property can be slightly relaxed to a "nearly IL" (NIL) property, which allows recovery with small uncertainty for all large enough $n$. In particular, we shall assume that given $w = (z_i, k_i^{i+n-1}, f(z_i, x_i^{i+n-1}, k_i^{i+n-1}), g(z_i, x_i^{i+n-1}))$, $\hat{x}_i^{i+n-1}$ must lie, with probability one, in a subset $A_n(w) \subset \hat{\mathcal{X}}^n$, where$^5$

$$\lim_{n\to\infty}\eta_n = 0, \qquad \eta_n \triangleq \frac{1}{n}\log\max_w |A_n(w)|. \qquad (29)$$

Perfect security should be defined as statistical independence between the cryptogram and both the source and the reconstruction, i.e., the probability of any segment of $\{y_i\}$ should depend on neither $x^n$ nor $\hat{x}^n$.
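In the simplest case, where the cryptogram is a one-time pad applied to an $n$-bit string (which may be the source or any deterministic function of it, such as $\hat{x}^n$), this independence can be checked exhaustively. The hypothetical Python helper below enumerates all equiprobable keys and returns the resulting distribution of the cryptogram; it illustrates the definition only and does not model a general finite-state encrypter.

```python
import itertools
from collections import Counter


def cryptogram_distribution(x_bits):
    """Distribution of y = x XOR k over all 2^n equiprobable keys k."""
    n = len(x_bits)
    counts = Counter(tuple(a ^ b for a, b in zip(x_bits, k))
                     for k in itertools.product((0, 1), repeat=n))
    total = 2 ** n
    return {y: c / total for y, c in counts.items()}
```

Any two plaintexts of the same length yield the identical (uniform) cryptogram distribution, which is exactly the required independence.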

In the proof of the converse part, $X^m$ should be replaced by $\hat{X}^m$ in all places in eq. (16), and we get

$$H(K^m) \ge H(\hat{X}^m) - 2\log s - m\eta_m, \qquad (30)$$

as $H(\hat{X}^m|Y^m,K^m,Z,Z')/m$ is upper bounded by $\eta_m$. Then, $H(\hat{X}^m)/m$ is further lower bounded in terms of $E\rho_{LZ}(\hat{x}^n)$, essentially in the same way as before, where here we have also used the fact that, due to the concavity of the entropy functional, $H(\hat{X}^m)$ is lower bounded by the expected $m$-th order conditional empirical entropy pertaining to the realizations of $\hat{x}^n$. Finally, since we require $d(x^n,\hat{x}^n) \le nD$ with probability one, $E\rho_{LZ}(\hat{x}^n)$ is trivially further lower bounded by $r_{LZ}(D;x^n)$.

$^5$ This might be the case if unambiguous reconstruction of $\hat{x}_i^{i+n-1}$ requires additional information from times later than $t = n+i-1$. For example, if the encrypter works in blocks of fixed size $m$, $\hat{x}^n$ is deterministic, and $n \gg m$, then, by viewing the block code as a finite-state machine as before, there might be uncertainty in no more than the last $m$ symbols of $\hat{x}^n$ in case the last block is incomplete (e.g., when $m$ does not divide $n$ or the $n$-block considered is not synchronized to the $m$-blocks). In this case, $|A_n(w)| \le |\hat{\mathcal{X}}|^m$, which is fixed, independent of $n$, and so $\eta_n = O(1/n)$.
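The concavity step can be illustrated numerically. When the key makes $\hat{x}^n$ random, the block distribution of $\hat{X}^m$ is the average of the empirical block distributions of the realizations, and the entropy of an average is at least the average of the entropies. The Python check below is shown for the unconditional $m$-block entropy with two hypothetical equiprobable realizations; the helper names are likewise hypothetical.

```python
import math
from collections import Counter


def block_distribution(x, m):
    """m-th order empirical distribution over non-overlapping m-blocks of x."""
    blocks = [tuple(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]
    counts = Counter(blocks)
    return {b: c / len(blocks) for b, c in counts.items()}


def entropy(p):
    """Entropy in bits of a PMF given as a dict."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def mixture(dists, weights):
    """Weighted average of PMFs (the expectation over the key)."""
    out = {}
    for d, w in zip(dists, weights):
        for b, q in d.items():
            out[b] = out.get(b, 0.0) + w * q
    return out
```

With realizations `"00000000"` and `"01010101"` and $m = 2$, each empirical block entropy is 0, while the averaged distribution has entropy 1 bit.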

Again, the direct part is obvious, and it implies that, at least asymptotically, there is nothing to gain from randomizing the reconstruction: the best choice of $\hat{x}^n$ is the one with minimum LZ complexity within the sphere of radius $nD$ around $x^n$. This conclusion is not obvious a priori, as one might speculate that a randomized reconstruction, depending on the key, may potentially be more secure than a deterministic one.

Note that we have not assumed anything about the distortion measure $d$, not even additivity. Another difference between Theorem 1 of the lossless case and its present extension to the lossy case is that we are no longer able to characterize the rate of convergence of $\delta_s(n)$, as it depends on the rate of decay of $\eta_m$. In fact, we could have replaced the IL property assumed in the lossless case by the NIL property there too, but again, the cost would be the loss of the ability to specify the behavior of $\delta_s(n)$.

4.3 Lossy Reconstruction With Side Information 

The simultaneous extension of Theorem 1, allowing both distortion $D$ and SI $s^n$, leads, with the obvious modifications, to $\min\{\rho_{LZ}(\hat{x}^n|s^n):\ d(x^n,\hat{x}^n) \le nD\}$, whose achievability is conceptually straightforward when all parties, including the encrypter, have access to $s^n$. But what if the encrypter does not have access to $s^n$?

In this case, there is no longer an apparent way to characterize the minimum key rate that must be consumed in terms of LZ complexities. This should not be surprising in view of the fact that even in the less involved problem of pure lossy compression of individual sequences with SI available at the decoder, performance is no longer characterized in terms of the LZ complexity (see, e.g., [13] and references therein). As in [13], here we are able to give a certain characterization for the case where the decrypter is also modeled as an FSM. While our performance characterization may not seem very explicit, the main message behind it (as in [13]) is that the performance of the best $s$-state encrypter-decrypter can be achieved by block codes of length $m$ within a redundancy term that decays as $m \to \infty$ for every fixed $s$.

Referring to our definition of the finite-state encrypter in Section 2, we model the decrypter as a device that implements the following recursive equations:

$$z'_{i+1} = g'(z'_i, y_i, s_i), \qquad i = 1,2,\ldots \qquad (31)$$
$$\hat{x}_{i-\tau} = f'(z'_i, y_i, s_i, k_i), \qquad i = \tau+1, \tau+2, \ldots \qquad (32)$$

where $\tau$ (a non-negative integer) is the encoding-decoding delay and $z'_i \in \mathcal{Z}'$ is the state of the decrypter at time $i$. We also model the channel from $x^n$ to $s^n$ as a memoryless channel

$$P_{S^n|X^n}(s^n|x^n) = \prod_{i=1}^{n} P_{S|X}(s_i|x_i). \qquad (33)$$
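The recursions (31)-(32) and the channel model (33) can be simulated directly. In the Python sketch below, `memoryless_prob` evaluates (33) from a per-letter table, and `run_decrypter` iterates a user-supplied next-state function $g'$ and output function $f'$, emitting $\hat{x}_{i-\tau}$ from time $\tau+1$ on. Both names are hypothetical, and the usage check relies on an assumed toy system (a one-state decrypter with $\tau = 0$ that XORs each cryptogram bit with one key bit), not on a construction from the paper.

```python
import math


def memoryless_prob(s, x, p_s_given_x):
    """Eq. (33): P(s^n | x^n) = prod_i P(s_i | x_i); table[x][s] = P(s | x)."""
    return math.prod(p_s_given_x[a][b] for a, b in zip(x, s))


def run_decrypter(g_next, f_out, z0, y, s, k, tau):
    """Iterate eqs. (31)-(32):
         z'_{i+1}     = g'(z'_i, y_i, s_i)
         xhat_{i-tau} = f'(z'_i, y_i, s_i, k_i),  i = tau+1, tau+2, ...
    Returns a dict mapping the (1-based) time index of xhat to its value."""
    z, xhat = z0, {}
    for i, (yi, si, ki) in enumerate(zip(y, s, k), start=1):
        if i > tau:
            xhat[i - tau] = f_out(z, yi, si, ki)   # output uses the current state
        z = g_next(z, yi, si)                      # then the state is updated
    return xhat
```

For a binary symmetric SI channel with crossover 0.1, the pair $x = 0101$, $s = 0111$ has probability $0.9^3 \cdot 0.1 = 0.0729$.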

We argue that the minimum key rate consumed by any finite-state encrypter-decrypter with $s$ states and delay $\tau$ is lower bounded by $r_s(D + \tau d_{\max}/m)$, where $d_{\max} = \max_{x,\hat{x}} d(x,\hat{x})$ is assumed finite and where $r_s(D)$ is defined as the minimum of $H(K^m|L)/m$ over all random variables $(Y, W, L)$ such that: (i) the support of $P_L$ is of size $(m+1)^{\alpha s-1}$, (ii) $X^m \to S^m \to Y$ is a Markov chain (perfect security), (iii) $\min_h E\{d(X^m, h(W, L, U^L, S^m, Y))\} \le mD$, (iv) $(W, L) \to X^m \to S^m$ is a Markov chain and $Y = g(W, L, X^m, U^L)$ for some deterministic function $g$, (v) the alphabet size of $W$ is $s^2$, and (vi) the alphabet size of $Y$ is the minimum needed (by the Carathéodory theorem) in order to maintain (i)-(v).

Consider again the partition of a block of length $n$ into $n/m$ non-overlapping blocks, each of length $m$, along with the induced joint empirical distribution $P_{K^mX^mY^mZZ'L}$ defined as before, except that now $Z'$ is the random variable that designates the relative frequency of the state of the decrypter $z'_t$ at times $t = im+1$, $i = 1, 2, \ldots, n/m$. Next, define

$$P_{K^mS^mX^mY^mZZ'L} = P_{K^mX^mY^mZZ'L} \times P_{S^m|X^m}. \qquad (34)$$
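The structure of (34), restricted to the $(X^m, S^m)$ marginal, can be made concrete: the empirical PMF of the non-overlapping $m$-blocks of $x^n$ is multiplied by the memoryless SI channel of (33). The Python sketch below omits the other variables $(K^m, Y^m, Z, Z', L)$, assumes that $m$ divides $n$, and uses hypothetical helper names.

```python
import itertools
import math
from collections import Counter


def empirical_block_pmf(x, m):
    """Empirical PMF of the n/m non-overlapping m-blocks of x (m must divide n)."""
    if len(x) % m:
        raise ValueError("m must divide n")
    blocks = [tuple(x[i:i + m]) for i in range(0, len(x), m)]
    return {b: c / len(blocks) for b, c in Counter(blocks).items()}


def joint_with_side_info(p_xm, p_s_given_x, s_alphabet):
    """P_{X^m S^m} = P_{X^m} x P_{S^m|X^m}, with P_{S^m|X^m} memoryless as in (33)."""
    joint = {}
    for xm, px in p_xm.items():
        for sm in itertools.product(s_alphabet, repeat=len(xm)):
            ps = math.prod(p_s_given_x[a][b] for a, b in zip(xm, sm))
            joint[(xm, sm)] = px * ps
    return joint
```

Because the channel is applied after the empirical distribution is formed, the $X^m$-marginal of the product is the empirical block PMF itself, which is what makes $S^m$ depend on the remaining variables only through $X^m$.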

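First, observe that $y_{im+1}^{im+m}$ depends (deterministically) only on $z_{im+1}$, $l$, $x_{im+1}^{im+m}$ and $u_{im+1}^{im+m}$, and similarly, $\hat{x}_{im-\tau+1}^{im-\tau+m}$ depends only on $k_{im+1}^{im+m}$, $s_{im+1}^{im+m}$, $y_{im+1}^{im+m}$, $z'_{im+1}$ and $l$. Let us then denote

$$y_{im+1}^{im+m} = g(z_{im+1}, l, x_{im+1}^{im+m}, u_{im+1}^{im+m})$$

and

$$\hat{x}_{im-\tau+1}^{im-\tau+m} = h(z'_{im+1}, l, k_{im+1}^{im+m}, s_{im+1}^{im+m}, y_{im+1}^{im+m}).$$

Assuming that $m > \tau$, let $\hat{x}_{im+1}^{im-\tau+m} = h'(z'_{im+1}, l, k_{im+1}^{im+m}, s_{im+1}^{im+m}, y_{im+1}^{im+m})$ be defined simply by truncating the first $\tau$ components of $h(z'_{im+1}, l, k_{im+1}^{im+m}, s_{im+1}^{im+m}, y_{im+1}^{im+m})$. Next, extend the definition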
of $P_{K^mS^mX^mY^mZZ'L}$ to $P_{K^mLX^m\hat{X}^{m-\tau}Y^mS^mZZ'}$ by defining $P_{\hat{X}^{m-\tau}|K^mS^mX^mY^mZ'L}$ as a degenerate PMF that puts all its mass on $\hat{X}^{m-\tau} = h'(Z', L, K^m, S^m, Y^m) = h'(Z', L, U^L, S^m, Y^m)$. Now observe that eq. (34) implies that $(Y^m, Z, Z', L) \to X^m \to S^m$ is a Markov chain. For the purpose of obtaining a lower bound on the key rate consumed by finite-state encryption-decryption systems, it is legitimate to let $g$ include dependence on $Z'$ and to let $h$ include dependence on $Z$, in addition to their dependencies on the other variables involved. By doing so, the random variables $Z$ and $Z'$ appear together in all relevant places of the characterization, and thus we can define $W = (Z, Z')$, a random variable whose alphabet size is $s^2$. The (variable-length) string $Y^m$ can be replaced by a single random variable $Y$ with the suitable alphabet size as defined above.

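As for the distortion, we have

$$\begin{aligned}
D &\ge \frac{1}{n}E\{d(x^n,\hat{X}^n)\} \\
&\ge \frac{1}{m}E\{d(X^{m-\tau},\hat{X}^{m-\tau})\} \\
&\ge \frac{1}{m}\left[E\{d(X^m,\hat{X}^m)\} - \tau\cdot d_{\max}\right]
\end{aligned} \qquad (35)$$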

where $\hat{X}^m$ is defined by concatenating $\hat{X}^{m-\tau}$ with a random $\tau$-vector in $\hat{\mathcal{X}}^\tau$ that is an arbitrary function of $(Z', L, U^L, S^m, Y^m)$ (or, equivalently, of $(W, L, U^L, S^m, Y^m)$). Thus, by definition, the minimum required key rate, $H(K^m|L)/m$, of any $s$-state encrypter-decrypter cannot be smaller than $r_s(D + \tau\cdot d_{\max}/m)$.
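The last step of (35) uses only the fact that the final $\tau$ letters of an $m$-block can contribute at most $\tau\cdot d_{\max}$ to the block distortion. Under the illustrative assumption of Hamming distortion ($d_{\max} = 1$), the Python sketch below checks this inequality on random binary blocks and computes the shifted distortion level $D + \tau d_{\max}/m$ at which $r_s$ is evaluated. The function names are hypothetical.

```python
import random


def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))


def check_truncation_bound(m, tau, trials=1000, seed=0):
    """Check d(x^{m-tau}, xhat^{m-tau}) >= d(x^m, xhat^m) - tau * d_max
    for Hamming distortion (d_max = 1) on random binary m-blocks."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(m)]
        xh = [rng.randint(0, 1) for _ in range(m)]
        if hamming(x[:m - tau], xh[:m - tau]) < hamming(x, xh) - tau * 1:
            return False
    return True


def effective_distortion(D, tau, d_max, m):
    """Argument of r_s in the converse bound: D + tau * d_max / m."""
    return D + tau * d_max / m
```

For example, with $D = 0.1$, $\tau = 2$, $d_{\max} = 1$ and $m = 100$, the bound is evaluated at $0.12$, and the penalty vanishes as $m \to \infty$ for fixed $\tau$.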

We can achieve this performance by block codes as follows. For a given empirical distribution of $X^m$ and SI channel $P_{S|X}$, find the optimum conditional distribution $P_{WL|X^m}$, the encrypter $g$ and the decrypter $h$ that achieve $r_s(D)$. For every $X_i$, $i = 1, 2, \ldots, n/m$, apply the channel $P_{WL|X^m}$ to generate $w_i$ and $l_i$ given $X_i$, and then compute $y_i = g(w_i, l_i, X_i, u^{l_i})$. Next, transmit $y_i$ plus one-time pad encrypted versions of $w_i$ and $l_i$ (to avoid any leakage of information concerning $X_i$ via these random variables). These encryptions of $w_i$ and $l_i$ require extra key rates of $(2\log s)/m$ and $(\alpha s-1)(\log m)/m$, respectively. The information concerning the optimum $h$ should be transmitted to the decrypter once in an $n$-block. Its one-time pad encryption requires an additional key rate given by the description length of $h$, which depends only on $m$ (as well as the alphabet sizes), normalized by $n$; hence it is negligible when $n \gg m$. The decrypter simply applies the decoding function $h$ and outputs the reconstruction.
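The key-rate bookkeeping of this block scheme can be summarized by a small Python function that adds the three overhead terms listed above to the core rate $H(K^m|L)/m$. It mirrors the text's terms (with $\log m$ rather than $\log(m+1)$ for the pad on $l_i$, and base-2 logarithms); the parameter `desc_bits_h`, standing for the description length of $h$ in bits, is hypothetical.

```python
import math


def block_code_key_rate(core_rate, s, alpha, m, desc_bits_h, n):
    """Total key rate (bits per source symbol) of the block scheme:
    core rate H(K^m|L)/m
    + pad on w_i (alphabet size s^2):          2 log s / m
    + pad on l_i (as in the text):             (alpha*s - 1) log m / m
    + pad on the description of h, once per n: desc_bits_h / n"""
    w_rate = 2 * math.log2(s) / m
    l_rate = (alpha * s - 1) * math.log2(m) / m
    h_rate = desc_bits_h / n
    return core_rate + w_rate + l_rate + h_rate
```

With $s = 2$, $\alpha = 2$, $m = 1024$ and a negligible description of $h$, the overhead above a core rate of 0.5 is 0.03125 bits per symbol, dominated by the slowly decaying $(\alpha s-1)(\log m)/m$ term.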
Finally, note that if the eavesdropper has another version of the side information sequence, say, $\tilde{s}^n$ (generated from $x^n$ by another known memoryless channel $P_{\tilde{S}|X}$), everything remains the same except that the perfect security requirement (ii) is replaced by $X^m \to \tilde{S}^m \to Y$.